Pushing the Bounds for Matrix-Matrix Multiplication

Authors

  • Tyler Michael Smith
  • Robert A. van de Geijn
Abstract

A tight lower bound is established for the I/O required to compute a matrix-matrix multiplication on a processor with two layers of memory. Prior work obtained weaker lower bounds by reasoning about the number of phases needed to perform C := AB, where each phase is a series of operations involving S reads and writes to and from fast memory, and S is the size of fast memory. A lower bound on the number of phases was then determined by obtaining an upper bound on the number of scalar multiplications performed per phase. This paper follows the same high-level approach, but improves the lower bound by considering C := AB + C instead of C := AB, and obtains the maximum number of scalar fused multiply-adds (FMAs) per phase instead of scalar additions. Key to obtaining the new result is the decoupling of the per-phase I/O from the size of fast memory. The new lower bound is $2mnk/\sqrt{S} - 2S$. The constant for the leading term is an improvement by a factor of $4\sqrt{2}$. A theoretical algorithm that attains the lower bound is given, and the sense in which the state-of-the-art Goto's algorithm also meets the lower bound is discussed.
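As a rough illustration of the bound quoted in the abstract, the sketch below evaluates $2mnk/\sqrt{S} - 2S$ for given problem sizes. The function names are hypothetical and not taken from the paper. The prior leading constant is not stated in the abstract; here it is inferred only from the stated improvement factor of $4\sqrt{2}$, so treat it as an assumption.

```python
import math


def io_lower_bound(m: int, n: int, k: int, S: int) -> float:
    """I/O lower bound 2mnk/sqrt(S) - 2S as quoted in the abstract.

    m, n, k are the matrix dimensions of C := AB + C and S is the
    size of fast memory, both counted in matrix elements.
    """
    return 2.0 * m * n * k / math.sqrt(S) - 2.0 * S


def new_leading_term(m: int, n: int, k: int, S: int) -> float:
    """Leading term of the new bound: 2mnk/sqrt(S)."""
    return 2.0 * m * n * k / math.sqrt(S)


def prior_leading_term(m: int, n: int, k: int, S: int) -> float:
    """Leading term of the earlier bound.

    Assumption: derived only from the abstract's claim that the new
    constant improves on the old one by a factor of 4*sqrt(2).
    """
    return new_leading_term(m, n, k, S) / (4.0 * math.sqrt(2.0))


if __name__ == "__main__":
    m = n = k = 1000
    S = 32 * 1024  # hypothetical fast-memory capacity in elements
    print("new bound:          ", io_lower_bound(m, n, k, S))
    print("new leading term:   ", new_leading_term(m, n, k, S))
    print("prior leading term: ", prior_leading_term(m, n, k, S))
```

For large problems the $2mnk/\sqrt{S}$ term dominates, so the $-2S$ correction matters only when the matrices are small relative to fast memory.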

Similar resources

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that can run on all structures, especially the Fibonacci Hypercube structure, is therefore necessary for parallel matr...

Algebraic adjoint of the polynomials-polynomial matrix multiplication

This paper deals with a result concerning the algebraic dual of the linear mapping defined by the multiplication of polynomial vectors by a given polynomial matrix over a commutative field

Lecture 4 : Concentration and Matrix Multiplication

Today, we will continue with our discussion of scalar and matrix concentration, with a discussion of the matrix analogues of Markov’s, Chebychev’s, and Chernoff’s Inequalities. Then, we will return to bounding the error for our approximating matrix multiplication algorithm. We will start with using Hoeffding-Azuma bounds from last class to get improved Frobenius norm bounds, and then (next time...

Some inequalities involving lower bounds of operators on weighted sequence spaces by a matrix norm

Let $A = (a_{n,k})_{n,k \ge 1}$ and $B = (b_{n,k})_{n,k \ge 1}$ be two non-negative matrices. Denote by $L_{v,p,q,B}(A)$ the supremum of those $L$ satisfying the following inequality: $\|Ax\|_{v,B(q)} \ge L \|x\|_{v,B(p)}$, where $x \ge 0$ and $x \in l_p(v,B)$, and $v = (v_n)_{n=1}^{\infty}$ is an increasing, non-negative sequence of real numbers. In this paper, we obtain a Hardy-type formula for $L_{v,p,q,B}(H)$, where $H$ is the Hausdorff matrix and $0 < q \le p \le 1$. Also...

Journal:
  • CoRR

Volume abs/1702.02017

Publication date: 2017